This repository has been archived by the owner on Feb 10, 2024. It is now read-only.

Fallback for non-UTF8 characters does not work anymore #1797

Closed
jhkl opened this issue Aug 18, 2016 · 2 comments

Comments

@jhkl

jhkl commented Aug 18, 2016

Since the X-Chat days I have set my character set in the server settings to "UTF-8". I understand that this usually poses a problem if another user in a channel uses a different encoding. But in the past, X-Chat and HexChat were better at guessing which character to actually display on my end.

In my concrete case the problematic characters are German umlauts (äöü), probably sent as ISO-8859-15 or CP1252. In the past these were more or less always displayed correctly. For some time now I often see the replacement character � instead of the correct ones. I can't pin this down to a specific HexChat version; if required, I can investigate further with some older versions.
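To illustrate why the umlauts turn into �, here is a minimal Python sketch. It is not HexChat's code; it assumes the client decodes incoming bytes strictly as UTF-8 and substitutes U+FFFD for every invalid byte. In CP1252 and ISO-8859-15, ä, ö and ü are the single bytes 0xE4, 0xF6 and 0xFC, none of which forms a valid UTF-8 sequence here, so all three collapse into the same replacement character. The helper name `strict_utf8_view` is hypothetical.

```python
def strict_utf8_view(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, replacing each invalid byte with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Umlauts as a legacy single-byte client would send them: b'\xe4\xf6\xfc'
legacy = "äöü".encode("cp1252")
shown = strict_utf8_view(legacy)

print(shown)            # three U+FFFD replacement characters
print(len(set(shown)))  # 1: the three different letters are now indistinguishable

# The same letters sent as UTF-8 survive unchanged.
print(strict_utf8_view("äöü".encode("utf-8")))
```

Because every invalid byte maps to the same code point, the original letter cannot be recovered after display.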

Is there some heuristic that was changed for a reason?

I am currently using an x86 build of HexChat 2.12.1 (not portable) on Windows 10 x64.

Example screenshot (top: current HexChat, middle: old X-Chat with UTF-8, bottom: old X-Chat with CP1252)

@TingPing
Member

This is somewhat discussed in #1636 (ignore a few of the unhelpful comments). In summary: trying random encodings that might work is not a good solution, and it caused a few issues that this change fixed, among other things. Yes, we know some users keep using awful encodings and it's hard for people to agree on them, but we think the better solution going forward is to be more strict and do the 'correct' thing.

@mk-pmb

mk-pmb commented Aug 10, 2018

to be more strict and do the 'correct' thing.

Currently this means a loss of information when I use HexChat, because all the different broken umlauts become the same replacement character. On FreeNode I can't even argue with the legacy-charset users about the "correct thing", because the servers' charset recommendation in the handshake is ASCII.
So the truly "correct" thing would be to configure HexChat to use ASCII and then be annoyed by all the strange UTF-8 byte pairs.
I asked in #freenode and got this recommendation:

<e> well, terrible and stupid though it is, the reality of IRC is that you need to decode as utf-8 and fall back on error to iso-8859-1, because otherwise you're not doing what everyone else expects […] i learned this the hard way, after years of having bots (and sometimes my own client) fail to decode stuff for seemingly no reason
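The recommended heuristic (decode as UTF-8, fall back to ISO-8859-1 on error) can be sketched in a few lines of Python. This is only an illustration of the idea, not HexChat's implementation; the function name `decode_irc_line` and its `fallback` parameter are hypothetical, and it assumes decoding happens per received line.

```python
def decode_irc_line(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: try strict UTF-8 first, then a legacy charset."""
    try:
        # Strict UTF-8: raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a code point,
        # so this fallback decode cannot fail.
        return raw.decode(fallback)


print(decode_irc_line("Grüße".encode("utf-8")))   # valid UTF-8, decoded as such
print(decode_irc_line("Grüße".encode("cp1252")))  # invalid UTF-8, Latin-1 fallback
```

ISO-8859-1 is a safe last resort because it accepts every byte, and äöü and ß have the same byte values in ISO-8859-1, ISO-8859-15 and CP1252. Python's `cp1252` codec, by contrast, rejects a few undefined bytes (such as 0x81), so it would not be a total fallback. The heuristic is not foolproof: legacy text that happens to form valid UTF-8 sequences is decoded as UTF-8.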

Can we please have an option to adhere to the robustness principle?
