Encodings

  • What is Unicode?

  • What is Unicode Consortium?

  • What is American Standard Code for Information Interchange (ASCII)?

  • What is the difference between ASCII and US-ASCII?

  • What is the difference between ASCII and Extended ASCII?

  • What is American National Standards Institute (ANSI)?

  • What is the difference between ASCII and ANSI?

  • What is UCS-2?

  • What is UTF-8?

  • What is the max UTF-... available?

  • What is the difference between UTF-8 and Unicode?

  • What is the difference between UTF-8 and Extended ASCII?

  • What is Byte order mark (BOM)?

  • What is the difference between big-endian and little-endian?

  • What is the difference between line feed LF (0x0A) and \n, or between carriage return CR (0x0D) and \r ?

  • What is the difference between NEXT LINE (NEL) (U+0085), Line Separator (LS) (U+2028), and End Of Line (EOL)?

  • What is the default End Of Line (EOL) for each of the following: Windows, Linux, OSX, Unix, older Mac?

  • What is the meaning of full stop (0x2E)?

  • What is iconv?

  • Why do some emails contain "J", for example "RegardsJ"?
    If you have Wingdings installed on your computer, the following character will appear as a smiley face. Otherwise, it will be the letter "J": J
    This is because the letter J represents a smiley face icon in the Wingdings font. Microsoft Outlook, a popular e-mail client, automatically converts the :) and :-) text emoticons into smiley face icons using the Wingdings font. Therefore, when Microsoft Outlook users type smiley faces in an e-mail message, they are sent as visual smiley face icons.
    Read more:
    https://pc.net/helpcenter/answers/letter_j_in_email_messages

MySQL

  • What is the difference between Collation and Character Set?
    Collation: A collation is a set of rules for comparing characters in a character set.
    Character set: A character set is a set of symbols together with their encodings. In MySQL every collation belongs to exactly one character set, so the character set can usually be inferred from the collation name (for example, utf8_general_ci belongs to utf8).
    Collations and character sets in MySQL apply to string data.
    Read more:
    https://medium.com/@manish_demblani/breaking-out-from-the-mysql-character-set-hell-24c6a306e1e5
    https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

  • At how many levels can a collation and character set be set in MySQL? Just note whether a collation and character set can be set specifically for a database, a table, or a column.
    MySQL defines a default character set and collation for each database. Each table can have its own character set and collation, which may match or differ from the database's. For even more flexibility, each column can have its own character set and collation, which may match or differ from the table's. This gives a lot of flexibility but also makes handling data more complex.
    Read more:
    https://medium.com/@manish_demblani/breaking-out-from-the-mysql-character-set-hell-24c6a306e1e5
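The database, table, and column levels described above behave like a chain of defaults: the most specific level that sets a value wins. Below is a minimal Python sketch of that lookup. The function `effective_charset` and its parameters are hypothetical names chosen for illustration and are not part of MySQL.

```python
from typing import Optional


def effective_charset(database: str,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Return the character set that applies to a column.

    Hypothetical helper: the most specific level that is set wins,
    falling back from column to table to database.
    """
    for candidate in (column, table):
        if candidate is not None:
            return candidate
    return database


print(effective_charset("utf8mb4"))                       # database default only
print(effective_charset("utf8mb4", table="latin1"))       # table overrides database
print(effective_charset("utf8mb4", "latin1", "utf8mb3"))  # column overrides table
```

The same fallback idea applies to collations.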

  • Let's say we have a MySQL database that supports UTF-8 (utf8mb3 by default, utf8mb4 if modified). Text can be stored in the TINYTEXT data type, which holds up to 255 bytes. How many utf8mb3 and utf8mb4 characters can TINYTEXT store?
    In the worst case, where every character takes the maximum number of bytes:
    utf8mb3 - 255 bytes / 3 bytes per character -> 85 characters
    utf8mb4 - 255 bytes / 4 bytes per character -> 63 characters
    Both encodings are variable-width, so a value made up only of 1-byte (ASCII) characters can hold up to 255 characters.
    Read more:
    https://mathiasbynens.be/notes/mysql-utf8mb4
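The TINYTEXT arithmetic can be checked with a short Python sketch. It assumes the 255-byte limit stated in the question and uses Python's own UTF-8 encoder as a stand-in for MySQL's byte counting. The helper names `guaranteed_chars` and `fits` are illustrative only.

```python
TINYTEXT_MAX_BYTES = 255  # capacity stated in the question above


def guaranteed_chars(max_bytes_per_char: int) -> int:
    """Worst case: every character uses the maximum width."""
    return TINYTEXT_MAX_BYTES // max_bytes_per_char


def fits(text: str) -> bool:
    """True if the UTF-8 encoding of text fits within the byte limit."""
    return len(text.encode("utf-8")) <= TINYTEXT_MAX_BYTES


print(guaranteed_chars(3))   # utf8mb3 worst case -> 85
print(guaranteed_chars(4))   # utf8mb4 worst case -> 63
print(fits("a" * 255))       # 255 one-byte characters -> True
print(fits("\u20ac" * 86))   # 86 euro signs, 3 bytes each = 258 bytes -> False
```

The limit is on bytes, not characters, so the number of characters that fit depends on which characters the text contains.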

  • What is the difference between utf8_unicode_ci and utf8_general_ci?
    In general, utf8_general_ci is faster than utf8_unicode_ci, but less correct.
    For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages ß is equal to ss. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.
    Read more:
    https://stackoverflow.com/questions/2344118/utf-8-general-bin-unicode/2344130#2344130
    https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html
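The idea of an expansion (one character comparing equal to a sequence of characters) can be illustrated in Python. This is only an analogy: `str.casefold` applies Unicode case folding, which happens to map ß to ss, and it is not the algorithm MySQL uses for utf8_unicode_ci. Both function names are hypothetical.

```python
def expansion_aware_equal(a: str, b: str) -> bool:
    """Compare after full case folding, which can expand one character
    into several (for example, 'ß' folds to 'ss')."""
    return a.casefold() == b.casefold()


def one_to_one_equal(a: str, b: str) -> bool:
    """Compare strictly character by character, ignoring case only.
    Strings of different lengths can never match."""
    return len(a) == len(b) and all(
        x.lower() == y.lower() for x, y in zip(a, b)
    )


print(expansion_aware_equal("straße", "STRASSE"))  # True: ß expands to ss
print(one_to_one_equal("straße", "STRASSE"))       # False: 6 vs 7 characters
print(one_to_one_equal("Strasse", "STRASSE"))      # True: same length, case ignored
```

A one-to-one comparison is cheaper because it never looks beyond the current pair of characters, which matches the speed-versus-correctness trade-off described above.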

  • What is the difference between utf8_bin and utf8_general_ci?
    utf8_bin compares the bits blindly. No case folding, no accent stripping.
    utf8_general_ci compares one character with one character. It does case folding and accent stripping, but no 2-character comparisons: ij is not equal to ĳ in this collation.
    Read more:
    https://stackoverflow.com/questions/2344118/utf-8-general-bin-unicode/2344130
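The contrast between a binary comparison and a case-insensitive, accent-insensitive one can be sketched in Python. This approximates the behaviour described above using the standard `unicodedata` module. It is not MySQL's implementation, and the function names are made up for illustration.

```python
import unicodedata


def bin_equal(a: str, b: str) -> bool:
    """Binary-style comparison: the encoded bytes must be identical."""
    return a.encode("utf-8") == b.encode("utf-8")


def strip_accents(s: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ci_ai_equal(a: str, b: str) -> bool:
    """Case-insensitive, accent-insensitive comparison (an approximation)."""
    return strip_accents(a).lower() == strip_accents(b).lower()


print(bin_equal("Résumé", "resume"))    # False: the bytes differ
print(ci_ai_equal("Résumé", "resume"))  # True: case and accents are ignored
print(ci_ai_equal("ij", "\u0133"))      # False: two letters vs the single ligature ĳ
```

The last line mirrors the ij example: a character-by-character comparison has no way to treat one ligature as two letters.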