Universal Encoding Detector currently supports over two dozen character encodings.
- Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
- EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
- EUC-KR and ISO-2022-KR (Korean)
- KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
- ISO-8859-2 and windows-1250 (Hungarian)
- ISO-8859-5 and windows-1251 (Bulgarian)
- ISO-8859-1 and windows-1252 (Western European languages)
- ISO-8859-7 and windows-1253 (Greek)
- ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
- TIS-620 (Thai)
- UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
- UTF-16 BE or LE (with a BOM)
- UTF-8 (with or without a BOM)
- ASCII
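To make the "with a BOM" entries concrete, here is a minimal sketch of byte order mark sniffing using only the Python standard library. It is an illustration, not the detector's actual implementation: the function name `sniff_bom` and the returned labels are assumptions made for this example, and the two unusual UTF-32 byte orders are written out by hand because the `codecs` module has no constants for them.

```python
import codecs
from typing import Optional

# Hypothetical table for illustration. Four-byte marks are checked first,
# because the UTF-32 LE BOM starts with the UTF-16 LE BOM and the
# 3412-ordered mark starts with the UTF-16 BE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),            # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),            # FF FE 00 00
    (b"\xfe\xff\x00\x00", "UTF-32 3412-ordered"),
    (b"\x00\x00\xff\xfe", "UTF-32 2143-ordered"),
    (codecs.BOM_UTF8, "UTF-8 with BOM"),           # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),            # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),            # FF FE
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a label for the BOM at the start of data, or None if absent."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + b"hello"))
    print(sniff_bom(b"plain ASCII, no BOM"))
```

A `None` result only means that no BOM was found. Data without a BOM may still be UTF-8, ASCII, or any of the legacy encodings listed above, and telling those apart requires looking at the content itself.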
Warning
Due to inherent similarities between certain encodings, some encodings may
be detected incorrectly. In my tests, the most problematic case was
Hungarian text encoded as ISO-8859-2 or windows-1250 (encoded as one but
reported as the other). Also, Greek text encoded as ISO-8859-7 was often
mis-reported as ISO-8859-2. Your mileage may vary.
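The following standard-library sketch suggests why these cases are hard. It does not call the detector at all; it only compares Python's built-in codecs. The sample strings and variable names are chosen for illustration. For typical Hungarian text, ISO-8859-2 and windows-1250 produce exactly the same bytes, so no detector could separate them from the bytes alone. Greek text in ISO-8859-7 also decodes without error as ISO-8859-2, so a detector has to rely on statistics rather than on invalid byte sequences.

```python
# A Hungarian pangram-style phrase that uses all the accented lowercase vowels.
hungarian = "árvíztűrő tükörfúrógép"

as_latin2 = hungarian.encode("iso8859_2")
as_cp1250 = hungarian.encode("cp1250")

# Both encodings place the Hungarian accented letters at the same byte values.
same_bytes = as_latin2 == as_cp1250
print("Identical byte sequences:", same_bytes)

# Greek text encoded as ISO-8859-7 ...
greek = "Καλημέρα"
greek_bytes = greek.encode("iso8859_7")

# ... is still structurally valid ISO-8859-2, so decoding raises no error
# and silently yields the wrong (Central European) letters.
misread = greek_bytes.decode("iso8859_2")
print("Misread Greek:", misread)
```

When the source of the data is known, it is safer to treat the detected encoding as a hint and prefer any explicit declaration (an HTTP header, an XML prolog, a database setting) over the guess.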