Different languages cause layout changes. #60

Closed
Neillife opened this issue Dec 10, 2022 · 10 comments · Fixed by #63
Labels
enhancement New feature or request

Comments

@Neillife

The table layout is offset when the output contains text in different languages.

[image: Issues]

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

output = t2a(
    header=["日期", "test"],
    body=[["2022/12/11", "test"], ["2022/1/1", "測試"]],
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Is there any solution?

@DenverCoder1
Owner

That is caused by the font not having monospace versions of those characters. It is similar when dealing with emoji.

Unfortunately, there is nothing the library can do about that: it can't know how wide variable-width characters will appear in the font used to display them, and even if it could, it wouldn't be able to line them up perfectly.

The library places the correct number of spaces, so the best you can do is install a different font for it to use, or manually adjust the widths using obscure varying-width whitespace characters and make it so it is aligned when using that particular font.

See also #32
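
To illustrate the mismatch described above, here is a minimal sketch using only Python's standard `unicodedata` module. The helper name `count_wide` is made up for this example and is not part of table2ascii. It assumes the rendering font draws East Asian Wide and Fullwidth characters two columns wide, which depends on the font and program.

```python
import unicodedata

def count_wide(text):
    """Count characters that Unicode marks as East Asian Wide or Fullwidth."""
    return sum(unicodedata.east_asian_width(ch) in ("W", "F") for ch in text)

for text in ["test", "日期", "2022/12/11", "測試"]:
    print(f"{text!r}: len={len(text)}, wide characters={count_wide(text)}")

# str.center pads by len(), so a 10-character cell holding two wide characters
# would take up 12 columns wherever those characters are drawn double-width.
cell = "日期".center(10)
print(len(cell), len(cell) + count_wide(cell))
```

Padding computed from `len()` is correct in terms of character count, yet the cell can still come out wider than its neighbours on screen.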

DenverCoder1 added the question (Further information is requested) label on Dec 11, 2022
@Neillife
Author

I work around the layout offset by checking the Unicode range of each string to detect the language.

For each string in that range, I append as many \u200b (zero-width space) characters as the string has characters.
Maybe not the best solution, but it solves the layout offset problem.

The complete code is as follows.

from table2ascii import table2ascii as t2a
from table2ascii import PresetStyle
from table2ascii import Alignment

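def handle_layout_offset(text):
    # One zero-width space (U+200B) per CJK character, so that len() of the
    # padded string matches its width when CJK characters render double-width.
    # Check each character; comparing the whole string only tests its start.
    return "".join("\u200b" for ch in text if "\u4e00" <= ch <= "\u9fa5")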

headerList = ["日期", "test"]
bodyList = [["2022/12/11", "test"], ["2022/1/1", "測試"]]

for i in range(len(headerList)):
    headerList[i] += handle_layout_offset(headerList[i])    

for body in bodyList:
    for i in range(len(body)):
        body[i] += handle_layout_offset(body[i])

output = t2a(
    header=headerList,
    body=bodyList,
    cell_padding=5,
    style=PresetStyle.double_thin_compact,
    alignments=[Alignment.CENTER] * 2
)

print(output)

Output result:
[image: output]

@DenverCoder1
Owner

It is interesting that the characters appear to be exactly double width.

I still think it will depend on the font and program used to render it, so if there is an internal fix, it should probably be an opt-in, toggle-able flag that would count potential double-width characters as 2 when determining the length.

The zero-width space solution does seem to be a good workaround for an external solution, though.
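
As a rough sketch of what such an opt-in flag could look like (not the library's actual implementation), the hypothetical helpers `display_len` and `center_cell` below count East Asian Wide and Fullwidth characters as 2 only when `wide_aware=True`. All names here are assumptions made for illustration, and zero-width characters are not handled.

```python
import unicodedata

def display_len(text, wide_aware=False):
    """Length of text; with wide_aware, wide/fullwidth characters count as 2."""
    if not wide_aware:
        return len(text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

def center_cell(text, width, wide_aware=False):
    """Center text in a cell of `width` columns using the chosen length rule."""
    pad = max(width - display_len(text, wide_aware), 0)
    left = pad // 2
    return " " * left + text + " " * (pad - left)

# With the flag off the result has 10 characters; with it on, 8 characters
# that would fill 10 columns if the font draws wide characters double-width.
print(repr(center_cell("日期", 10)))
print(repr(center_cell("日期", 10, wide_aware=True)))
```

ASCII-only text produces the same result with the flag on or off, so a toggle like this would only change cells containing wide characters.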

DenverCoder1 added the enhancement (New feature or request) label and removed the question (Further information is requested) label on Dec 11, 2022
@DenverCoder1
Owner

Some relevant links that could help with this:

https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters

Python API - https://pypi.org/project/wcwidth/
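
For reference, a minimal sketch of measuring display width with the `wcswidth` function from the wcwidth package linked above. The `except` branch is only a rough stand-in built on the standard library so the snippet still runs without the package. It is an assumption for illustration, not wcwidth's actual algorithm.

```python
import unicodedata

try:
    from wcwidth import wcswidth  # third-party: pip install wcwidth
except ImportError:
    def wcswidth(text):
        """Rough approximation used only when wcwidth is not installed."""
        width = 0
        for ch in text:
            # Skip combining marks and format characters (zero columns).
            if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
                continue
            wide = unicodedata.east_asian_width(ch) in ("W", "F")
            width += 2 if wide else 1
        return width

for text in ["test", "日期", "2022/1/1", "測試"]:
    print(f"{text!r}: len={len(text)}, display width={wcswidth(text)}")
```

The reported width is still only an estimate of what a terminal shows; fonts that do not draw these characters at exactly two columns will not line up.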

@DenverCoder1
Owner

If you have a chance, let me know what you think of the proposed solution in #63

@DenverCoder1
Owner

DenverCoder1 commented Dec 11, 2022

My current thinking is we can add the flag to toggle the feature, but default it to True, making it a major release (1.0.0) but also possible to revert to the old way.

@Neillife
Author

Neillife commented Dec 12, 2022

> Some relevant links that could help with this:
>
> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
>
> Python API - https://pypi.org/project/wcwidth/

I had been looking for a way to calculate the display length of a Unicode string; maybe the keywords I entered in the search engine were not precise enough.
wcwidth is exactly the Python library I was looking for.

> If you have a chance, let me know what you think of the proposed solution in #63

I'm wondering if it can handle all languages, as well as emoji and special symbols?
This would be the perfect solution if possible!
I'm looking forward to your next table2ascii release.

> My current thinking is we can add the flag to toggle the feature, but default it to True, making it a major release (1.0.0) but also possible to revert to the old way.

About the use_wcwidth flag: if it does not affect the output of the original version, I think it can default to true.

@DenverCoder1
Owner

> I'm wondering if it can handle all languages as well as emoji, special symbols?

If the font used for displaying the characters makes them exactly 0, 1, or 2 characters wide, it should also fix those. Otherwise, it may still be slightly off.

For the most part, it seems pretty good, at least in my terminal.

image

Sites such as GitHub and Discord seem to still not line up the Chinese characters exactly due to the font.

+----+----+----+----+----+
|  ​  | 🦁 | 🦡 | 🦅 | 🐍 |
+----+----+----+----+----+
| 💻 | ✅ | ✅ | ❌ | ❌ |
+----+----+----+----+----+
| 📅 | ✅ | ❌ | ✅ | ❌ |
+----+----+----+----+----+
| 🥞 | 日 | 月 | 火 | 水 |
+----+----+----+----+----+

> If it does not affect the output of the original version, I think it can be preset to true.

Yeah, in nearly all cases the output will be the same as it used to be. Even with your zero-width space workaround, it should still be fine, since the zero-width spaces will be counted as 0 when the other characters are counted as 2.

The output does change in some cases, although it seems like in nearly all cases, it makes it better.
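
The arithmetic behind that compatibility claim can be checked with a small sketch. `approx_width` is a hypothetical stand-in for a width-aware count (wide characters as 2, format characters such as U+200B as 0) and is not the library's code.

```python
import unicodedata

def approx_width(text):
    """Approximate column count: wide = 2, format characters = 0, others = 1."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) == "Cf":  # includes U+200B ZERO WIDTH SPACE
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total

plain = "日期"
workaround = plain + "\u200b" * len(plain)  # the zero-width space workaround

# Old rule: len() sees 2 characters + 2 zero-width spaces.
# Width-aware rule: 2 + 2 columns for the characters, 0 for the spaces.
print(len(workaround), approx_width(workaround), approx_width(plain))
```

Both rules arrive at the same number for the padded string, so tables that already use the workaround should keep their layout.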

@Neillife
Author

> Yeah, nearly all cases the output will be the same as it used to. Even with your zero width space workaround, it should still be fine since the zero width spaces will be counted as 0 when the other characters are counted as 2.
>
> The output does change in some cases, although it seems like in nearly all cases, it makes it better.

Yeah, this really makes it even better!
I think the width of the table can be calculated more accurately.
Maybe this solution can reduce the cases where certain characters cause the layout to shift.

image

+----+----+----+----+----+
|  ​  | 🦁 | 🦡 | 🦅 | 🐍 |
+----+----+----+----+----+
| 💻 | ✅ | ✅ | ❌ | ❌ |
+----+----+----+----+----+
| 📅 | ✅ | ❌ | ✅ | ❌ |
+----+----+----+----+----+
| 🥞 | 日 | 月 | 火 | 水 |
+----+----+----+----+----+

It looks like Terminal likes this scheme, but sites like GitHub and Discord don't. 😮

@DenverCoder1
Owner

This feature is now released in version 1.0.1
