Formatter: Unicode width is too high #6499

konstin · 2023-08-11T11:45:22Z

The following snippet fits visually within the line length in PyCharm, and Black keeps it in this layout:

function_call(
    "aaaaaaaaaaaaaaaaaaaa", "我隻氣墊船裝滿晒鱔.txt", "मेरी मँडराने वाली नाव सर्पमीनों से भरी ह"
)

Ruff formats it as:

function_call(
    "aaaaaaaaaaaaaaaaaaaa",
    "我隻氣墊船裝滿晒鱔.txt",
    "मेरी मँडराने वाली नाव सर्पमीनों से भरी ह",
)

We likely overestimate the Unicode width of the strings, so we assume the line is too long.
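To make the two ways of measuring concrete, here is a minimal sketch that compares the code point count of the argument line with a rough estimate of its column width. It is not Ruff's or Black's actual implementation: `display_width` is a hypothetical helper built on Python's standard `unicodedata` module, and it assumes that wide and fullwidth characters take two columns, nonspacing and enclosing marks take none, and everything else takes one.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many columns `text` occupies (rough approximation only)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            # Nonspacing/enclosing marks attach to the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and fullwidth characters are counted as two columns.
            width += 2
        else:
            # Everything else, including ambiguous ("A"), counts as one column.
            width += 1
    return width


# The single-line layout from the snippet above, including its indentation.
line = '    "aaaaaaaaaaaaaaaaaaaa", "我隻氣墊船裝滿晒鱔.txt", "मेरी मँडराने वाली नाव सर्पमीनों से भरी ह"'
print("code points:", len(line))
print("estimated columns:", display_width(line))
```

A tool that counts code points and a tool that estimates columns like this can disagree on whether the same line fits within the limit, which would explain the different layouts.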

MichaReiser · 2023-08-11T12:09:01Z

How did you measure the width in PyCharm? My understanding is that PyCharm's cursor position (row:column) is character-based, not width-based.

konstin · 2023-08-11T13:47:31Z

Really just visually:

The Chinese characters each line up with two Latin characters; the Hindi text doesn't.

I don't know which rules each tool uses to measure width, but we should try to match what Black does.

MichaReiser · 2023-08-11T14:03:44Z

Black is migrating to unicode-width in its preview style, but only for strings (not identifiers).

Edit: But we should look into this.

konstin · 2023-08-11T15:25:22Z

Black also breaks the line in preview mode! I guess that makes more sense according to the Unicode text width algorithm, but it's also odd because it differs from the actual text rendering.

MichaReiser · 2023-08-22T14:52:56Z

@konstin and I talked about this and, we must admit, we're confused. I will close this issue because I believe we're doing the right thing. If you're more familiar with CJK characters and/or have reasons to believe that the implementation is incorrect, please open a new issue and link this one. We'll hopefully be able to figure out the correct behavior together. Reasons for closing:

According to @konstin, our implementation matches Black's preview style, which is good.
The Unicode spec recommends treating ambiguous characters as half-width when the display context cannot be inferred (e.g. from the font in use):

Ambiguous characters behave like wide or narrow characters depending on the context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably, they should be treated as narrow characters by default.
The formatter uses width, which treats ambiguous characters as half-width.
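The narrow-by-default rule for ambiguous characters can be sketched as follows. This is only an illustration using Python's `unicodedata` module, not the formatter's code; `char_width` and its `ambiguous_wide` switch are hypothetical names introduced here.

```python
import unicodedata


def char_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Width of a single character; `ambiguous_wide` is a hypothetical switch."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and fullwidth characters always take two columns.
        return 2
    if eaw == "A":
        # Ambiguous characters: narrow unless the context says otherwise.
        return 2 if ambiguous_wide else 1
    return 1


# Print the East Asian Width class and both width variants per character.
for ch in ("a", "±", "§", "我"):
    eaw = unicodedata.east_asian_width(ch)
    print(ch, eaw, char_width(ch), char_width(ch, ambiguous_wide=True))
```

Note that the ideographs in the example are classified as wide (W) rather than ambiguous (A), so under this sketch the switch would not change their width either way.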

I also went ahead and pasted the example into my IDE, and it seems to exceed the column width for my configuration (the xx... is exactly 88 characters wide).

konstin added bug Something isn't working formatter Related to the formatter labels Aug 11, 2023

konstin mentioned this issue Aug 11, 2023

📋 Black-compatible formatting of django #6069

Closed

20 tasks

MichaReiser added this to the Formatter: Alpha milestone Aug 17, 2023

konstin added needs-decision Awaiting a decision from a maintainer and removed bug Something isn't working labels Aug 21, 2023

MichaReiser assigned konstin Aug 22, 2023

konstin assigned MichaReiser and unassigned konstin Aug 22, 2023

MichaReiser closed this as completed Aug 22, 2023
