HuidaeCho/cjkformat
CJK Format

This Python 3 module provides utility functions for formatting fixed-width CJK strings. Functionally, it is similar to the libaprintf (aligned printf) C library.

Aligning Korean print formats in Python 3

Installation

This module is published on PyPI, so you can install it using pip3:

pip3 install --user cjkformat

Yes, there is no version 0.1.0 on PyPI. I had to make a small fix, and I didn't know that PyPI never allows the same version number to be uploaded again, even after the project is deleted. I also see one typo in the README.md there, but who cares!

Introduction

One of the most important features of this module is to align Latin and CJK characters using %{width}s specifiers. For example, you may expect the following lines

print('%-10s|%-10s|' % ('ab', 'cd'))
print('%-10s|%-10s|' % ('가나다라', '마바사아'))

to produce

[Image: aligned output]

because both lines use string specifiers of the same width (%-10s). However, this code snippet will actually produce

[Image: misaligned output]

even though the pipe characters are vertically aligned between the two print lines in the source code. This misalignment occurs because each wide CJK character counts as one character (e.g., len('가') == 1) even though it takes up two columns on screen. To resolve this alignment issue, the width component of the string specifier needs to be reduced by the number of wide CJK characters in the argument string. The above example can be fixed by reducing the width from 10 to 10 - 4 = 6 (there are four CJK characters in each %-10s argument) as follows:

print('%-10s|%-10s|' % ('ab', 'cd'))
print('%-6s|%-6s|' % ('가나다라', '마바사아'))

which will print nicely aligned

[Image: aligned output]
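To check the arithmetic above, here is a minimal sketch that counts wide characters with the standard unicodedata module. The helper names count_wide and display_width are hypothetical and are not part of cjkformat. The sketch also assumes that every East Asian Wide or Fullwidth character takes exactly two columns, which ultimately depends on the terminal and font.

```python
import unicodedata


def count_wide(s):
    """Count characters classified as East Asian Wide ('W') or Fullwidth ('F')."""
    return sum(1 for ch in s if unicodedata.east_asian_width(ch) in ('W', 'F'))


def display_width(s):
    """Columns used on screen, assuming each wide character takes two columns."""
    return len(s) + count_wide(s)


text = '가나다라'
print(len(text))         # 4: Python counts code points, not columns
print(count_wide(text))  # 4: every character here is wide

# Reduce the width by the number of wide characters: 10 - 4 = 6.
adjusted = 10 - count_wide(text)
fixed = '%-*s|' % (adjusted, text)
plain = '%-10s|' % 'ab'

print(display_width(fixed))             # 11: ten columns plus the pipe
print(display_width(plain))             # 11: same as the Latin line
print(display_width('%-10s|' % text))   # 15: the unadjusted, misaligned line
```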

However, it would be very cumbersome to count CJK characters and adjust widths in the format string every time we use CJK characters. I found some solutions in other articles, but I was not happy with any of them because they deviate too much from the usage of print() and disrupt its regular patterns. It would also be hard to justify the use of those functions to non-CJK developers when they have very different calling patterns.

In this module, I tried to mimic the usage of print() as much as possible. Since the % operator is evaluated before its result is passed to any function, I was not able to use exactly the same syntax as print(). Instead, I tried to keep the name of the core function short and simple. Indeed, its name is simply f, which is the same as the prefix for the f-string syntax. I also defined printf(), which behaves much like the printf() function in the C language. This function combines print(..., end='') and f() so that no newline is added.
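To illustrate the idea, here is a rough sketch of how such a function could work. It is not the actual implementation in cjkformat: the names cjk_f and cjk_printf are hypothetical, and it handles only a subset of specifiers (optional flags, a numeric width, an optional precision and %%), with no support for * widths or mapping keys. It rewrites each %{width}s specifier by subtracting the number of wide characters in the matching argument and then applies the ordinary % operator.

```python
import re
import unicodedata

# "%%" or "%[flags][width][.precision]type"; a simplified subset of the grammar.
_SPEC = re.compile(r'%(?:%|([-+ 0#]*)(\d*)((?:\.\d+)?)([a-zA-Z]))')


def _count_wide(s):
    """Count East Asian Wide ('W') and Fullwidth ('F') characters."""
    return sum(1 for ch in s if unicodedata.east_asian_width(ch) in ('W', 'F'))


def cjk_f(fmt, *args):
    """Format like fmt % args, shrinking %s widths by the wide-character count."""
    remaining = iter(args)

    def adjust(match):
        if match.group(0) == '%%':
            return '%%'  # literal percent sign consumes no argument
        flags, width, precision, conv = match.groups()
        arg = next(remaining, '')  # too few arguments: let % raise later
        if conv != 's' or not width:
            return match.group(0)  # leave other specifiers untouched
        text = str(arg)
        if precision:
            text = text[:int(precision[1:])]  # %s precision truncates first
        new_width = int(width) - _count_wide(text)
        width_part = str(new_width) if new_width > 0 else ''
        return '%' + flags + width_part + precision + 's'

    return _SPEC.sub(adjust, fmt) % args


def cjk_printf(fmt, *args):
    """Print the formatted string without a trailing newline, like C printf()."""
    print(cjk_f(fmt, *args), end='')


cjk_printf('%-10s|%-10s|\n', 'ab', 'cd')
cjk_printf('%-10s|%-10s|\n', '가나다라', '마바사아')
```

In a terminal that renders wide characters in two columns, the pipe characters in the two printed lines should line up.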

Now, using the new f() function, the above example would be

from cjkformat import f

print(f('%-10s|%-10s|', 'ab', 'cd'))
print(f('%-10s|%-10s|', '가나다라', '마바사아'))

which will print

[Image: aligned output]

Note that the f() function takes both the format string and its arguments. Also, unlike the % operator used with print(), which takes a single tuple of arguments, f() takes a variable number of arguments.

Equivalently, using printf(),

from cjkformat import printf

printf('%-10s|%-10s|\n', 'ab', 'cd')
printf('%-10s|%-10s|\n', '가나다라', '마바사아')

will produce the same output:

[Image: aligned output]

License

Copyright (C) 2020, Huidae Cho <https://idea.isnew.info>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
