Fallback for non-UTF8 characters does not work anymore #1797

jhkl · 2016-08-18T21:13:15Z

Since the X-Chat days I have had my character set in the server settings set to "UTF-8". I understand that it usually poses a problem if another user in a channel uses a different encoding. But in the past, X-Chat and HexChat were better at guessing which character should actually be displayed on my end.

In my concrete case the problem is German umlauts (äöü), probably sent as ISO-8859-15 or CP1252. In the past these were more or less always displayed correctly. For some time now I have often seen the replacement character � instead of the correct ones. I can't pin it down to a particular HexChat version; if required, I can investigate further with some old versions.
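To make the symptom concrete, here is a minimal Python sketch of what happens to legacy-encoded umlauts when the receiving side insists on UTF-8. It is only an illustration, not HexChat code; the sample word and the assumption that the sender uses CP1252 are mine.

```python
# Assumption: the sender's client encodes its text as CP1252.
text = "Grüße"
raw = text.encode("cp1252")  # b'Gr\xfc\xdfe'

# Strict UTF-8 decoding rejects these bytes outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient UTF-8 decoding substitutes U+FFFD for the bytes it cannot decode.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the charset the sender actually used recovers the text.
recovered = raw.decode("cp1252")

print(strict_ok, lossy, recovered)
```

The `lossy` string is what a strictly UTF-8 client ends up displaying, while `recovered` is what a client that guessed the legacy charset would show.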

Is there some heuristic that was changed for a reason?

I am currently using an x86 build of HexChat 2.12.1 (not portable) on Windows 10 x64.

Example:

(top: current Hexchat, middle: old Xchat with UTF-8, bottom: old Xchat with CP1252)

TingPing · 2016-08-18T21:28:14Z

This is somewhat discussed in #1636 (ignore a few of the unhelpful comments). The summary is that trying random encodings that might work is not a good solution; it caused a few issues that were fixed by dropping it. Yes, we know some users keep using awful encodings and it's hard for people to agree on them, but we think it is a better solution going forward to be more strict and do the 'correct' thing.

mk-pmb · 2018-08-10T13:05:37Z

> to be more strict and do the 'correct' thing.

Currently this means a loss of information when I use HexChat, because all the different broken umlauts become the same replacement character. On FreeNode I can't even argue with the legacy charset users about the "correct thing", because the servers' charset recommendation in the handshake is ASCII.
So the really "correct" thing to do would be for me to configure HexChat to use ASCII and then be annoyed by all the strange UTF-8 byte pairs.
I asked in #freenode and got this recommendation:

<e> well, terrible and stupid though it is, the reality of IRC is that you need to decode as utf-8 and fall back on error to iso-8859-1, because otherwise you're not doing what everyone else expects […] i learned this the hard way, after years of having bots (and sometimes my own client) fail to decode stuff for seemingly no reason
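The recommendation quoted above (decode as UTF-8, fall back to ISO-8859-1 on error) can be sketched in a few lines of Python. This is a hedged illustration of the idea, not how HexChat or X-Chat implement it; the function name `decode_irc_line` and its `fallback` parameter are hypothetical.

```python
def decode_irc_line(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode one received line as UTF-8, falling back to a legacy charset.

    Hypothetical helper: the name and the per-line granularity are
    assumptions made for this sketch.
    """
    try:
        # Valid UTF-8 (which includes plain ASCII) is accepted as-is.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a character, so this
        # fallback never raises and never discards a byte.
        return raw.decode(fallback)


print(decode_irc_line("Grüße".encode("utf-8")))
print(decode_irc_line("Grüße".encode("iso-8859-1")))
```

The fallback applies to the whole line, so a line that mixes valid UTF-8 with stray legacy bytes is decoded entirely as the fallback charset. Passing `fallback="cp1252"` would cover the CP1252 case mentioned earlier in the thread, with the caveat that Python's `cp1252` codec leaves a handful of byte values undefined and can still raise on them.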

Can we please have an option to adhere to the robustness principle?

TingPing closed this as completed Aug 18, 2016

TingPing added duplicate question labels Aug 18, 2016

This was referenced Nov 18, 2019

Enhancement: Raw __undecoded__ line access from scripts #1430

Open

Add option to log raw server communication #2397

Open
