UTF-8 rendering woes #75

Closed
tycho opened this issue Apr 8, 2016 · 42 comments

@tycho

tycho commented Apr 8, 2016

Examples below use the UTF-8 demo file.

Some of the rendering issues could be attributed to the font (Consolas), but some cannot.

Here's Consolas with MinTTY (Cygwin):
[Screenshot: Consolas on MinTTY]

And here's Consolas with "Bash on Windows":
[Screenshot: Consolas on Bash]

Consolas simply doesn't do well on the box drawing tests.
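To repeat the box-drawing comparison without the demo file, a short script can print the whole Unicode Box Drawing block (U+2500 to U+257F) as a grid; a font that lacks these glyphs shows gaps or placeholder boxes. This is a minimal sketch, not part of the original report, and `box_drawing_rows` is a hypothetical helper name.

```python
# Print the Unicode Box Drawing block (U+2500..U+257F) as a 16-column grid
# so that different console fonts can be compared side by side.
BOX_START, BOX_END = 0x2500, 0x2580  # end is exclusive: 128 code points

def box_drawing_rows(columns: int = 16) -> list[str]:
    """Return the block as rows of `columns` characters each."""
    chars = [chr(cp) for cp in range(BOX_START, BOX_END)]
    return ["".join(chars[i:i + columns]) for i in range(0, len(chars), columns)]

if __name__ == "__main__":
    # Label each row with the code point of its first character.
    for row_start, row in zip(range(BOX_START, BOX_END, 16), box_drawing_rows()):
        print(f"U+{row_start:04X}  {row}")
```

Running it in two terminals with the same font makes renderer differences easier to separate from font differences.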

One of the best monospace fonts I've found is DejaVu Sans Mono. But cmd.exe's properties page doesn't allow me to select that font when it's installed. It has a static list of fonts that appear in the Windows Registry. In order to use fonts other than Lucida Console, Consolas, or raster fonts, I need to replace one of the fonts listed in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont. In my case, I replaced Consolas with DejaVu Sans Mono for another test:
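The registry change above can be scripted. The sketch below uses Python's standard `winreg` module and must run elevated on Windows. Unlike the report, which replaced the Consolas entry, it adds a new entry; it assumes the common convention that extra values under `TrueTypeFont` are named with zeros ("0", "00", "000", ...). `next_zero_name` and `add_console_font` are hypothetical helper names, and a sign-out or reboot may be needed before the font shows up in the properties dialog.

```python
import sys

# Key that lists the TrueType fonts offered by the console properties dialog.
CONSOLE_FONT_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"

def next_zero_name(existing_names: set[str]) -> str:
    """Return the shortest all-zero value name ("0", "00", ...) not yet in use.
    Assumption: extra console fonts are conventionally registered under such names."""
    name = "0"
    while name in existing_names:
        name += "0"
    return name

def add_console_font(face_name: str) -> str:
    """Register `face_name` (e.g. "DejaVu Sans Mono") for the console; needs admin rights.
    Returns the value name used; reuses an existing entry if the font is already listed."""
    if sys.platform != "win32":
        raise OSError("this sketch only works on Windows")
    import winreg  # Windows-only standard-library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONSOLE_FONT_KEY, 0,
                        winreg.KEY_READ | winreg.KEY_SET_VALUE) as key:
        _, value_count, _ = winreg.QueryInfoKey(key)
        # EnumValue returns (name, data, type); keep name -> font face.
        entries = dict(winreg.EnumValue(key, i)[:2] for i in range(value_count))
        for name, value in entries.items():
            if value == face_name:
                return name
        name = next_zero_name(set(entries))
        winreg.SetValueEx(key, name, 0, winreg.REG_SZ, face_name)
        return name
```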

DejaVu Sans Mono with MinTTY (Cygwin):
[Screenshot: DejaVu Sans Mono on MinTTY]

DejaVu Sans Mono with "Bash on Windows":
[Screenshot: DejaVu Sans Mono on Bash]

Now the box drawing tests are fine, but there are numerous UTF-8 glyphs that are unavailable for use.

So the problems are:

  • Missing UTF-8 glyphs.
  • No custom font selection in cmd.exe properties.
  • No decent-quality font provided with Windows for box drawing or other UTF-8 characters.
@mobluse

mobluse commented Apr 8, 2016

Thanks, I also added DejaVu Sans Mono to cmd.exe and Cygwin64 Terminal. Now the links text browser also renders http://www.fileformat.info/info/unicode/block/arrows/utf8test.htm correctly with UTF-8 -- press Esc to get a menu.

@mobluse

mobluse commented May 15, 2016

The example below works better in WSL Terminal (i.e. WSLtty) than in Ubuntu from the Store for WSL (using Cmd.exe). You can run it in two windows and compare. Some characters are missing in Cmd.exe.

sudo apt-get install toilet

ls /usr/share/figlet/ | sed 's/\..*//' | sort -u | while read -r font; do toilet -F list | sed -n '/"/ {s/"\(.*\)".*/\1/;p}' | while read -r filter; do toilet -f "$font" -F "$filter" Orbin 2> /dev/null; done; done

@BobFrankston

I'm also seeing a problem with Unicode. I have a similar problem with Unicode over SSH, but when I use PuTTY everything works fine.

There is one strange thing in this example. The string 方思腾.香港 rendered fine until I moved the cursor over it, and then it didn't recover. This image shows the original version and the version after the cursor moved over it. Running Emacs in PuTTY on an Ubuntu system does not have the problem. I tried two different fonts and had the same problem. (This also raises the question of why the command prompt doesn't do Unicode by default, but that's a different topic.)

[Screenshots: "untitled" and "problem"]

@zerocool4u2

@BobFrGit that's because those characters aren't monospaced and cmd doesn't really support them. (By the way, a font can be marked as monospace without every character actually having the same width; it's just a font property.) I have similar problems with glyphs. Try to find a font whose glyphs for those Unicode characters are monospaced, or you could try processing the font with a Python script; there is a project called powerline patched fonts (if I recall correctly) that has scripts that could help you.

@BobFrankston

The monospace assumption is interesting. I use Epsilon on Windows, and when I go down it goes to the same nth character on the next line. But using Emacs in Ubuntu, when I go down it goes to the character visually below. The question then is why Emacs in PuTTY does it "right", or at least doesn't get confused, while using it in bash fails. If I use Emacs with SSH I get a different result: substitution characters. (Same for DigitalOcean's own access tool.)

In exploring Unicode I found that it can be far more complicated, so I'm not trying to solve the general case -- just observing that PuTTY is an existence proof of a better approach.

@KindDragon

Using the ConEmu terminal can also help with this.

@BobFrankston

Thanks. For now Epsilon and PuTTY work sufficiently well for me. Just wanted to flag the problem.

@zerocool4u2

@BobFrGit I mean the kanji, or whatever they are: you can see that they are double-width, so when you move the cursor over them they get split in half, and you see the part that would match if each were one cell wide. For example, if 1, 2 and 3 were double-width, "1 2 3 456" would show as "1 243 456" with the cursor next to the 2, because the cell next to the 2 is the fourth position in a monospace grid.
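The half-glyph effect comes from mixing two notions of position: character index and screen column. CJK ideographs such as 方 have the East Asian Width property "W" and occupy two cells. The sketch below uses Python's `unicodedata.east_asian_width`; treating combining marks as zero width and everything else as one cell is a simplification (real terminals use fuller `wcwidth`-style tables, with emoji and ambiguous-width rules). It also contrasts the Emacs-like movement described earlier (keep the visual column) with the Epsilon-like one (keep the character index). The helper names are hypothetical.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Cells a character occupies: 2 for East Asian Wide/Fullwidth, 0 for combining marks, else 1.
    Simplified: no emoji or ambiguous-width handling."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def display_width(text: str) -> int:
    """Total number of screen cells `text` occupies."""
    return sum(char_width(ch) for ch in text)

def index_at_column(text: str, column: int) -> int:
    """Index of the character covering 0-based screen `column`; len(text) if past the end."""
    col = 0
    for i, ch in enumerate(text):
        col += char_width(ch)
        if column < col:
            return i
    return len(text)

line = "方思腾.香港"
print(display_width(line), len(line))  # cells vs. characters
# Moving down from column 4 of an ASCII line: keeping the column (Emacs-like) lands on
# index 2, while keeping the character index (Epsilon-like) lands on index 4.
print(index_at_column(line, 4))
```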

@BobFrankston

(Actually they are hanzi 汉字, but don't worry about it.) As I mentioned above, both Emacs on Ubuntu via PuTTY and Epsilon on the PC don't have the problem, though they take different approaches to dealing with the fact that those characters are not monospace.

This is not a big deal for me now -- just wanted to flag it. One thing I found is that if I change the CMD font, the dir listing shows 汉字 file names properly.

@mrmckeb

mrmckeb commented Mar 12, 2017

I'm not sure if this is related, but it seems to be - I'm finding a lot of characters/symbols aren't rendering as expected in Bash on Windows 10. Although not necessary, a lot of build tools use special characters to show status of tests, etc.

I understand emoji is a whole different issue... so not raising that here.

@BobFrankston

Yeah -- been playing with 32-bit Unicode and that's a challenge in its own right. As an FYI, Word seems to do a pretty good job on emoji, and I discovered that Alt-X lets me enter them. (At least some -- when I tried to enter ancient Chinese rod numerals I didn't find a font that had them.)

@whisust

whisust commented Mar 13, 2017

Hey @mrmckeb, same here: unable to use Unicode emoji / symbols...
I had a customized PS1 prompt with git, using up and down arrows. They are only squares now u_u

@mrmckeb

mrmckeb commented Mar 14, 2017

@antlatrille Similar to what I've seen. Hopefully we can get more support for this in future releases!

@Karasuni

Karasuni commented May 25, 2017

Still encountering this issue using Bash on Windows over 1 year since the initial report. Is there any fix?

@bitcrazed
Contributor

Hey all. It's important to note that Console not being able to display a given symbol or set of symbols is a many-sided-blade! ;)

Alas, because the Console's text renderer is GDI-based, we're unable to support features like font fallback, which would allow us to use fonts that contain a specific set of symbols (e.g. emoji, Klingon) but gradually fall back on more expansive fonts for other characters.
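Font fallback, in outline: for each character, use the primary font if it has a glyph, otherwise try the next font in a fallback list, and only draw the "unprintable" glyph when no font has it. The sketch below models fonts as plain sets of code points; the font names and their coverage are made up for illustration, and real fallback (for example in DirectWrite) also deals with shaping, clusters and per-script font preferences.

```python
# Toy model of per-character font fallback. Font names and coverage are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Font:
    name: str
    codepoints: frozenset[int]

    def has_glyph(self, ch: str) -> bool:
        return ord(ch) in self.codepoints

TOFU = "\ufffd"  # stand-in for the "unprintable character" glyph

def resolve(text: str, fonts: list[Font]) -> list[tuple[str, str]]:
    """Pair each character with the first font that covers it, or (TOFU, "none")."""
    result = []
    for ch in text:
        font = next((f for f in fonts if f.has_glyph(ch)), None)
        result.append((ch, font.name) if font else (TOFU, "none"))
    return result

ascii_font = Font("PrimaryMono", frozenset(range(0x20, 0x7F)))
arrows_font = Font("ArrowFallback", frozenset({0x2192}))  # U+2192 RIGHTWARDS ARROW

# A single font, as with the renderer described above: the arrow cannot be drawn.
for ch, font in resolve("a\u2192", [ascii_font]):
    print(f"U+{ord(ch):04X} -> {font}")
# With a fallback chain, the arrow comes from the second font.
for ch, font in resolve("a\u2192", [ascii_font, arrows_font]):
    print(f"U+{ord(ch):04X} -> {font}")
```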

We have a goal to replace our renderer with a more modern DirectWrite renderer at some point in the (increasingly near) future.

When we do, we'll be able to do A LOT of very cool, modern, fancy things with text that we're simply unable to do right now.

Bear with us ;)

@ronindesign

Thanks for the update on this.

@fcharlie

fcharlie commented Jun 7, 2017

@bitcrazed Will you use Direct2D to rewrite the Console?
Please add D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT to enable color font, thanks !!!

@BobFrankston

A side effect of revisiting this thread is that I realized I can use escape sequences in NodeJS console.log. I presume the new capabilities will be available via escape sequences, so they can be used without needing to update libraries to take advantage of the new features.
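For example, SGR ("Select Graphic Rendition") escape sequences change text attributes: `ESC [ 31 m` selects a red foreground and `ESC [ 0 m` resets all attributes. The same bytes work from NodeJS's console.log or any other program writing to a VT-capable terminal. A minimal Python sketch; `sgr` and `colored` are hypothetical helper names.

```python
# Build ANSI/VT SGR (Select Graphic Rendition) escape sequences by hand.
ESC = "\x1b"

def sgr(*params: int) -> str:
    """Return CSI <params> m, e.g. sgr(1, 31) -> ESC[1;31m (bold, red foreground)."""
    return f"{ESC}[{';'.join(str(p) for p in params)}m"

RESET = sgr(0)

def colored(text: str, *params: int) -> str:
    """Wrap `text` in the given SGR attributes and reset afterwards."""
    return f"{sgr(*params)}{text}{RESET}"

if __name__ == "__main__":
    print(colored("error", 1, 31), colored("ok", 32))
```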

@dernyn

dernyn commented Jun 17, 2017

It's not just bash; it seems to be a Windows problem... I just tried the same fonts in Notepad and WordPad.
It's the edit control, inherent to GDI+, which has a problem with monospaced font rendering all over Windows. Third-party components that don't depend on the edit control work fine: the Firefox, Chrome and Mozilla rendering engines work perfectly, and so do Scintilla-based editors. mintty comes from PuTTY and it works fine there too.

@hwaldstein

It appears we've recently passed nine months since the last collaborator update, and this issue is still unresolved. Or, at least, I'm experiencing the same issues described above. Is there any news of progress on fixing this, or a more clear definition of what "(increasingly near) future" means? Any update would be greatly appreciated.

@jacoby

jacoby commented Mar 5, 2018

I'm in agreement with @hwaldstein, but I have seen that unicode characters work using Hyper as the terminal for WSL instead of the default.

I'm not as happy with its ANSI colors, but that's on Hyper, not WSL. Is there a better repo for this issue than WSL?

@BobFrankston

There are rumors of a new command processor and/or shell. If so, does that moot this and instead shift the focus to feature requests and betas?

@bitcrazed
Contributor

@fcharlie - you can count on that :)

@bitcrazed
Contributor

bitcrazed commented Mar 30, 2018

@BobFrGit Yes, our guidance (we'll be publishing some in the coming weeks) is to SetConsoleMode enabling ENABLE_VIRTUAL_TERMINAL_INPUT & use VT/ANSI escape sequences moving forward.
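A minimal sketch of that guidance from Python via `ctypes`, assuming a process attached to a console on Windows 10 or later. Note that the flag that makes the console interpret VT sequences written to the output handle is `ENABLE_VIRTUAL_TERMINAL_PROCESSING` (0x0004); `ENABLE_VIRTUAL_TERMINAL_INPUT` (0x0200), named above, applies to the input handle and makes the console deliver keyboard input as VT sequences. The sketch always sets the output flag and sets the input flag only on request; `enable_vt_mode` is a hypothetical helper name.

```python
import ctypes
import sys

STD_INPUT_HANDLE = -10
STD_OUTPUT_HANDLE = -11
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004  # output handle: interpret VT/ANSI sequences
ENABLE_VIRTUAL_TERMINAL_INPUT = 0x0200       # input handle: deliver keystrokes as VT sequences

def _add_console_mode(kernel32, std_handle: int, flag: int) -> bool:
    """OR `flag` into the mode of a standard handle; False if it is not a console."""
    handle = kernel32.GetStdHandle(std_handle)
    mode = ctypes.c_uint32()  # DWORD
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. redirected to a file or pipe
    return bool(kernel32.SetConsoleMode(handle, mode.value | flag))

def enable_vt_mode(vt_input: bool = False) -> bool:
    """Enable VT processing on stdout (and optionally VT input on stdin).
    Returns False off Windows or when a handle is not attached to a console."""
    if sys.platform != "win32":
        return False  # terminals elsewhere generally need no mode switch
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p  # HANDLE is pointer-sized
    kernel32.GetConsoleMode.argtypes = [ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32)]
    kernel32.SetConsoleMode.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
    ok = _add_console_mode(kernel32, STD_OUTPUT_HANDLE, ENABLE_VIRTUAL_TERMINAL_PROCESSING)
    if vt_input:
        ok = _add_console_mode(kernel32, STD_INPUT_HANDLE, ENABLE_VIRTUAL_TERMINAL_INPUT) and ok
    return ok

if __name__ == "__main__":
    if enable_vt_mode():
        print("\x1b[32mVT sequences enabled\x1b[0m")
```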

@bitcrazed
Contributor

@dernyn - as I pointed out above, GDI based display tech struggles with several mechanisms (esp. font-fallback) that are essential for displaying complex modern glyphs, including ninjacat emoji 🐱‍👤.
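To see why that emoji is hard for a single-font renderer: the ninja cat is not one code point but a zero-width-joiner (ZWJ) sequence of CAT FACE (U+1F431), ZERO WIDTH JOINER (U+200D) and BUST IN SILHOUETTE (U+1F464). A renderer needs a font that maps the whole sequence to one glyph; otherwise it draws the parts separately. A small Python check of the pieces (`describe` is a hypothetical helper):

```python
import unicodedata

NINJA_CAT = "\U0001F431\u200D\U0001F464"  # cat face + ZWJ + bust in silhouette

def describe(seq: str) -> list[str]:
    """List each code point of `seq` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in seq]

for line in describe(NINJA_CAT):
    print(line)
print(len(NINJA_CAT), "code points,", len(NINJA_CAT.encode("utf-8")), "UTF-8 bytes")
```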

In the future, we plan on replacing the Console's current GDI-based renderer with a renderer that uses DirectWrite (directly or indirectly), which will eliminate almost all of our rendering issues, and many of our internationalization issues, in one fell swoop!

@bitcrazed
Contributor

Hey @hwaldstein - thanks for your continued patience. While it may appear that we've been rather quiet over the last year or so, we've actually been cranking away, modernizing and overhauling much of the Console's internals, paving the way for us to start delivering user-visible improvements in future releases.

The 18H2 (2018, 2nd half) release that we're currently working on will deliver some pretty cool improvements, esp. for anyone building 3rd party terminals, and command-line shells, tools, and apps.

We have a long list of Console features queued up for subsequent OS releases too.

@bitcrazed
Contributor

@jacoby - thanks for your patience; I also refer you to my reply to @hwaldstein above.

Re. repo choices: We'll be moving many of these Console-related issues over to the new Console issues repo in the coming months, but feel free to post new issues over there from now on.

@bitcrazed
Contributor

@BobFrGit - I am not aware of any new shell being created at Microsoft. We already have Cmd and PowerShell, and of course bash/zsh/fish/etc. in your favorite Linux distro(s) running atop WSL.

@fcharlie

@bitcrazed I'm glad to see your decision, and I'm looking forward to the new console.

@jacoby

jacoby commented Mar 30, 2018

@bitcrazed And of course the Cygwin-based Bash that is used in Git4Win, etc.

Can hardly wait for summer and the new Console. I like everything about Hyper except the lag.

@bitcrazed
Contributor

@jacoby - yes, but Cygwin isn't a Microsoft shell.

And to be clear, we're not shipping a "new" Console this summer - it's the same Console, with significantly improved internals, and several bug fixes and improvements.

@jacoby

jacoby commented Mar 30, 2018

Gotcha.

@bardware

I have a setting in my .vimrc file in msys2 that displays every TAB as ➪
I don't see that character in WSL/Ubuntu (from store)

[Two screenshots]

@bitcrazed
Contributor

@bardware VERY likely that code-point isn't included in your console's currently selected font. As mentioned above/elsewhere, Console renders using GDI which cannot perform font-fallback, so if your font doesn't contain the glyph for ➪ then we can only display the unprintable char glyph.

@bardware

in your console's currently selected font

I played around a bit and tried some fonts already, but I'll keep looking.
Thanks for your reply.

@therealkenc
Collaborator

I played around a bit and tried some fonts already, but I'll keep looking.

Quoth from the top:

rendering issues could be attributed to the font (Consolas), but some cannot.

The ➪ glyph is in the "kinda not" category. So, don't burn too much time downloading every fixed-width font you can find on the Interwebs. It isn't going to help (or call me 😮 if it does). As Rich alludes a bunch of posts back, getting from a given Unicode sequence to a particular glyph is "a process". I'm sure all will be golden with the new engine. But in this instance, it's not likely to be fixed by a different font, which one could reasonably misread "currently selected" in the previous post as implying. Good luck.

@stereokai

stereokai commented Apr 21, 2018

Can you please share with us - because you were rather vague 3 weeks ago - will "the same Console, with significantly improved internals, and several bug fixes and improvements" support UTF-8? Or will you only start working on it after "18H2 (2018, 2nd half)", meaning we should gather more patience? Thank you very much, tons of kudos for your work!

@bitcrazed
Contributor

All I can share right now is that we're working hard to make all the changes necessary to support UTF-8 which then enables us to work on adding rendering support for emoji, complex scripts, etc.

Not going to put dates on things until we're confident that a) things are working, b) we understand which releases our stuff lines up for.

It's a complex process, but bear with us - we're on it.

@BobFrankston

My sympathy -- Unicode can get amazingly complex.

@stereokai

Thanks a lot @bitcrazed

@bitcrazed
Contributor

@BobFrGit .. and people wonder why I've got so much more gray hair these days ;)

[Image: getting old]

@stereokai Thanks 😀

@bitcrazed
Contributor

Hey all. Thanks for the discussion re. this issue. We're right in the middle of a ton of Console internals re-engineering that'll allow the Console to accurately support Unicode & UTF-8 text.

Closing this issue since:

  1. This work is underway
  2. This is the WSL issues repo, but this is an issue in Console which has its own Console GitHub Repo
  3. GitHub doesn't yet allow issues to be moved between repos, preserving posters' identity :(

If you have further asks/issues, please file new issues on our Console GitHub Repo.
