Different languages cause layout changes. #60

Neillife · 2022-12-10T19:30:41Z

The table layout is offset when the output contains text in different languages.

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

output = t2a(
    header=["日期", "test"],
    body=[["2022/12/11", "test"], ["2022/1/1", "測試"]],
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Is there any solution?

DenverCoder1 · 2022-12-11T01:19:50Z

That is caused by the font not having monospace versions of those characters. It is similar when dealing with emoji.

Unfortunately, there is nothing the library can do about that, since it isn't possible to know how wide the variable-width characters will appear in the font used to display them, and even if it could know, it wouldn't be able to line them up perfectly.

The library places the correct number of spaces, so the best you can do is install a different font for it to use, or manually adjust the widths using obscure varying-width whitespace characters and make it so it is aligned when using that particular font.

See also #32
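
To make the mismatch concrete, here is a minimal sketch (not part of table2ascii) that compares len() with an estimated column count. The helper name display_width is hypothetical, and the sketch assumes a renderer that draws East Asian wide and fullwidth characters at exactly two columns, which depends on the font as noted above.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate columns, ASSUMING wide/fullwidth characters take exactly 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


for cell in ["test", "日期", "測試"]:
    # len() counts code points; the renderer may use more columns than that.
    print(repr(cell), "len =", len(cell), "columns =", display_width(cell))
```

Under that assumption, "日期" has a len() of 2 but occupies 4 columns, so padding computed from len() leaves the cell two columns too wide.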

Neillife · 2022-12-11T10:02:43Z

I check the Unicode range of each string to detect its language and compensate for the layout offset.

For strings in those languages, I append as many \u200b (zero-width space) characters as the string has characters.
It may not be the best solution, but it solves the layout offset problem.

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

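def handle_layout_offset(text):
    # Return one zero-width space (U+200B) per CJK character in text,
    # so that len() of the padded string matches its double-width rendering.
    cjk_count = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fa5')
    return '\u200b' * cjk_count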

headerList = ["日期", "test"]
bodyList = [["2022/12/11", "test"], ["2022/1/1", "測試"]]

for i in range(len(headerList)):
    headerList[i] += handle_layout_offset(headerList[i])    

for body in bodyList:
    for i in range(len(body)):
        body[i] += handle_layout_offset(body[i])

output = t2a(
    header=headerList,
    body=bodyList,
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Output result:

DenverCoder1 · 2022-12-11T15:34:34Z

It is interesting that the characters appear to be exactly double width.

I still think it will depend on the font and program used to render it, so if there is an internal fix, it should probably be an opt-in, toggle-able flag that would count potential double-width characters as 2 when determining the length.

The zero-width space solution does seem to be a good workaround for an external solution, though.
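
A rough sketch of why the workaround lines up when padding is computed from len(): each appended zero-width space raises len() by one without adding a column, so the padded cell receives fewer spaces. The helper name columns is hypothetical, and the code assumes that wide characters render at exactly two columns and that format and combining characters render at zero.

```python
import unicodedata

ZWSP = "\u200b"  # zero-width space


def columns(text: str) -> int:
    """Estimated column count under the assumptions stated above."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Cf", "Mn"):
            continue  # format/combining characters: assumed to take no columns
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


plain = "日期".center(10)               # padded by len(): 8 spaces added
fixed = ("日期" + ZWSP * 2).center(10)  # len() is now 4: only 6 spaces added
ascii_cell = "test".center(10)

print(columns(plain), columns(fixed), columns(ascii_cell))  # 12 10 10
```

The unpadded cell ends up two columns wider than the ASCII cell, while the cell with two zero-width spaces matches it.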

DenverCoder1 · 2022-12-11T18:44:49Z

Some relevant links that could help with this:

https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters

Python API - https://pypi.org/project/wcwidth/
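
As an illustration of how the wcwidth package could be used for this, the sketch below centers cells by display width instead of len(). The helpers display_width and center are hypothetical names, not table2ascii functions. If wcwidth is not installed, the code falls back to a rougher standard-library estimate, which is an assumption made only so the example runs anywhere.

```python
import unicodedata

try:
    from wcwidth import wcswidth  # third-party: pip install wcwidth
except ImportError:
    wcswidth = None


def display_width(text: str) -> int:
    """Columns needed to display text, according to wcwidth when available."""
    if wcswidth is not None:
        width = wcswidth(text)
        if width >= 0:  # wcswidth returns -1 for non-printable characters
            return width
    # Fallback estimate: East Asian wide/fullwidth characters count as 2.
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def center(text: str, width: int) -> str:
    """Center text within width columns, measured by display width."""
    extra = max(width - display_width(text), 0)
    left = extra // 2
    return " " * left + text + " " * (extra - left)


for cell in ["2022/1/1", "測試", "日期"]:
    print("|" + center(cell, 12) + "|")
```

Whether the printed borders actually line up still depends on the font rendering wide characters at exactly two columns.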

DenverCoder1 · 2022-12-11T19:38:23Z

If you have a chance, let me know what you think of the proposed solution in #63

DenverCoder1 · 2022-12-11T19:41:37Z

My current thinking is we can add the flag to toggle the feature, but default it to True, making it a major release (1.0.0) but also possible to revert to the old way.

Neillife · 2022-12-12T10:35:55Z

> Some relevant links that could help with this:
>
> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>
> Python API - https://pypi.org/project/wcwidth/

I was looking for a way to calculate the display length of a Unicode string; maybe the keywords I entered in the search engine were not precise enough.
wcwidth is exactly the Python library I was looking for.

> If you have a chance, let me know what you think of the proposed solution in #63

I'm wondering if it can handle all languages as well as emoji, special symbols?
This would be the perfect solution if possible!
I'm looking forward to your next table2ascii release.

> My current thinking is we can add the flag to toggle the feature, but default it to True, making it a major release (1.0.0) but also possible to revert to the old way.

If the use_wcwidth flag does not affect the output of the original version, I think it can default to True.

DenverCoder1 · 2022-12-12T15:04:52Z

> I'm wondering if it can handle all languages as well as emoji, special symbols?

If the font used for displaying the characters makes them exactly 0, 1, or 2 characters wide, it should also fix those. Otherwise, it may still be slightly off.

For the most part, it seems pretty good, at least in my terminal.

Sites such as GitHub and Discord seem to still not line up the Chinese characters exactly due to the font.

+----+----+----+----+----+
|    | 🦁 | 🦡 | 🦅 | 🐍 |
+----+----+----+----+----+
| 💻 | ✅ | ✅ | ❌ | ❌ |
+----+----+----+----+----+
| 📅 | ✅ | ❌ | ✅ | ❌ |
+----+----+----+----+----+
| 🥞 | 日 | 月 | 火 | 水 |
+----+----+----+----+----+

> If it does not affect the output of the original version, I think it can be preset to true.

Yeah, in nearly all cases the output will be the same as it used to be. Even with your zero-width space workaround, it should still be fine, since the zero-width spaces will be counted as 0 while the other characters are counted as 2.

The output does change in some cases, although it seems like in nearly all cases, it makes it better.

Neillife · 2022-12-13T06:57:56Z

> Yeah, in nearly all cases the output will be the same as it used to be. Even with your zero-width space workaround, it should still be fine, since the zero-width spaces will be counted as 0 while the other characters are counted as 2.
>
> The output does change in some cases, although it seems like in nearly all cases, it makes it better.

Yeah, this really makes it even better!
I think the width of the table can now be calculated more accurately.
Maybe this solution can reduce the number of cases where many characters cause the layout to shift.

+----+----+----+----+----+
|    | 🦁 | 🦡 | 🦅 | 🐍 |
+----+----+----+----+----+
| 💻 | ✅ | ✅ | ❌ | ❌ |
+----+----+----+----+----+
| 📅 | ✅ | ❌ | ✅ | ❌ |
+----+----+----+----+----+
| 🥞 | 日 | 月 | 火 | 水 |
+----+----+----+----+----+

It looks like Terminal likes this scheme, but sites like GitHub and Discord don't. 😮

DenverCoder1 · 2022-12-14T22:58:26Z

This feature is now released in version 1.0.1

DenverCoder1 added the question label Dec 11, 2022

DenverCoder1 added the enhancement label and removed the question label Dec 11, 2022

DenverCoder1 mentioned this issue Dec 11, 2022

feat!: Add use_wcwidth for Asian character support #63

Merged

DenverCoder1 closed this as completed in #63 Dec 14, 2022
