# Encoding
Computers store everything as sequences of 0s and 1s, so to represent characters we need an agreed mapping between bit patterns and characters. Such a mapping is called an encoding.
___
# ASCII
`ASCII` (American Standard Code for Information Interchange) is a character encoding standard for electronic communication.

`ASCII` codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

`ASCII` stores each character in 1 byte (8 bits), but only the low 7 bits carry information. Therefore, `it encodes 128 (which is 2^7) different characters`, including upper- and lower-case English letters, the digits 0 to 9, symbols such as !@#$%, and control codes such as newline and tab.
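
As a small illustration in Python (the sample text `"Hi!"` is an arbitrary choice), the built-in `ord` and `chr` functions expose the number behind each ASCII character, and every value fits in 7 bits:

```python
# Each ASCII character corresponds to a number below 128 (2**7).
text = "Hi!"
codes = [ord(ch) for ch in text]
print(codes)                          # [72, 105, 33]

# chr() goes the other way: number -> character.
print(chr(65), chr(97))               # A a

# Encoding as ASCII yields exactly one byte per character.
ascii_bytes = text.encode("ascii")
print(ascii_bytes, len(ascii_bytes))  # b'Hi!' 3

# The highest bit of every byte is 0, so 7 bits are enough.
print([format(b, "08b") for b in ascii_bytes])
# ['01001000', '01101001', '00100001']
```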

However, it doesn’t include characters from other writing systems, such as Chinese characters or Korean Hangul.
Therefore, we need something more capable, something that can represent many more characters.
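
The limitation is easy to reproduce. The sketch below asks Python to encode a Chinese character as ASCII; the flag name `ascii_ok` is only an illustrative variable:

```python
# ASCII has no entry for characters outside its 128-character table,
# so Python refuses to encode them.
try:
    "租".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as err:
    ascii_ok = False
    print("cannot encode:", err.reason)

print(ascii_ok)  # False
```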
___



# Unicode 
`Unicode` is a character set (a superset of ASCII) that assigns each character a unique number called a code point.

The standard, which the Unicode Consortium maintains, `defines 144,697 characters covering 159 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes` (figures as of Unicode 14.0; later versions add more).
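
To make this concrete, the following Python sketch looks up the Unicode number and official name of a few characters with the standard `unicodedata` module. The three sample characters are arbitrary picks.

```python
import unicodedata

# ord() returns the Unicode number of a character;
# unicodedata.name() returns its official name.
for ch in ["A", "租", "😀"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+79DF CJK UNIFIED IDEOGRAPH-79DF
# U+1F600 GRINNING FACE

# ASCII is a subset: the first 128 Unicode numbers match the ASCII table.
ascii_is_subset = all(chr(i).encode("ascii")[0] == i for i in range(128))
print(ascii_is_subset)  # True
```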
___
# UTF-8
Unicode code points can be mapped to binary in different ways. The most commonly used encoding is `UTF-8`.
`UTF-8 is a variable-length encoding with a minimum of 8 bits per character`. Characters with higher code points take more bytes, up to 32 bits. `So UTF-8 encodes each Unicode character in 1 to 4 bytes.`
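
A quick way to see the variable length is to encode characters from different parts of Unicode and count the bytes. The four samples below are arbitrary choices, one for each possible length:

```python
# One sample character for each UTF-8 length (1 to 4 bytes).
samples = ["A", "\u00e9", "租", "😀"]  # "\u00e9" is the letter e with an acute accent

byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+79DF -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```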

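In principle, 4 bytes could distinguish 2^32 = 4,294,967,296 values. However, `UTF-8 reserves some bits of every byte to mark its role in a 1- to 4-byte sequence`, so fewer bits are left for the character itself: at most 21 in the 4-byte form. Unicode further caps code points at U+10FFFF, which allows 1,114,112 code points.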
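
The reduced capacity can be worked out by counting the bits that remain for the character once the marker bits are set aside. The sketch below assumes the standard UTF-8 layout described in the rest of this section; the dictionary names are illustrative only.

```python
# Bits available for the character in the first byte of a sequence,
# keyed by the total number of bytes in that sequence.
lead_payload = {1: 7, 2: 5, 3: 4, 4: 3}

# Every following (continuation) byte contributes 6 more bits.
payload_bits = {n: lead + 6 * (n - 1) for n, lead in lead_payload.items()}
print(payload_bits)          # {1: 7, 2: 11, 3: 16, 4: 21}

print(2 ** payload_bits[4])  # 2097152 values fit in 21 bits
print(0x10FFFF + 1)          # 1114112 code points allowed by Unicode

# Python enforces the Unicode limit of U+10FFFF.
chr(0x10FFFF)                # accepted
try:
    chr(0x110000)
    beyond_limit_ok = True
except ValueError:
    beyond_limit_ok = False
print(beyond_limit_ok)       # False
```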

How does that work? Each byte starts with a few bits that tell you whether it's a single-byte code point, the start of a multi-byte code point, or a continuation of a multi-byte code point. For example, a single-byte code point is:
```
0xxx xxxx (A single-byte code; these are the 128 ASCII characters)
```
The multi-byte code points each start with a few bits that essentially say, "hey, you need to also read the next byte (or two, or three) to figure out what I am." They are:
```
110x xxxx One more byte follows 
1110 xxxx Two more bytes follow 
1111 0xxx Three more bytes follow
```
Finally, the bytes that follow those start codes all look like this:
```
10xx xxxx (A continuation of one of the multi-byte characters)
```
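
The bit patterns above can be turned into a small hand-written encoder. This is a minimal sketch for illustration, not how Python implements UTF-8 internally: `utf8_encode` is a hypothetical helper, and it does not reject surrogate code points (U+D800 to U+DFFF) the way the built-in codec does.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by filling the x bits of the UTF-8 patterns."""
    if code_point < 0x80:       # fits in 7 bits: 0xxx xxxx
        return bytes([code_point])
    if code_point < 0x800:      # fits in 11 bits: 110x xxxx + 1 continuation
        return bytes([
            0b1100_0000 | (code_point >> 6),
            0b1000_0000 | (code_point & 0b11_1111),
        ])
    if code_point < 0x10000:    # fits in 16 bits: 1110 xxxx + 2 continuations
        return bytes([
            0b1110_0000 | (code_point >> 12),
            0b1000_0000 | ((code_point >> 6) & 0b11_1111),
            0b1000_0000 | (code_point & 0b11_1111),
        ])
    if code_point <= 0x10FFFF:  # fits in 21 bits: 1111 0xxx + 3 continuations
        return bytes([
            0b1111_0000 | (code_point >> 18),
            0b1000_0000 | ((code_point >> 12) & 0b11_1111),
            0b1000_0000 | ((code_point >> 6) & 0b11_1111),
            0b1000_0000 | (code_point & 0b11_1111),
        ])
    raise ValueError("code point out of Unicode range")


# Compare the hand-written encoder with Python's built-in one.
for ch in ["A", "\u00e9", "租", "😀"]:
    manual = utf8_encode(ord(ch))
    assert manual == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", " ".join(format(b, "08b") for b in manual))
```

Printing each byte in binary makes the leading marker bits (`0`, `110`, `1110`, `11110`, `10`) visible in the output.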
___




# bytes 
In Python, `bytes` is a data type for storing binary data: an immutable sequence of bytes. `Each byte consists of 8 bits`, so it is an integer ranging from 0 to 255. A `bytes` value can be written in several notations, such as binary or hexadecimal.
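
For instance, the same three bytes can be built from decimal integers, from hexadecimal text, or from a literal. The variable names below are illustrative only:

```python
# The same three bytes written in different notations.
from_ints = bytes([231, 167, 159])       # decimal integers in 0..255
from_hex = bytes.fromhex("e7a79f")       # hexadecimal text
literal = b"\xe7\xa7\x9f"                # bytes literal with hex escapes
print(from_ints == from_hex == literal)  # True

# Indexing or iterating yields integers, not one-character strings.
print(literal[0], list(literal))         # 231 [231, 167, 159]
print(literal.hex())                     # e7a79f
print([bin(b) for b in literal])
# ['0b11100111', '0b10100111', '0b10011111']
```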

___

# str.encode(), bytes.decode()
In Python 3, `a string is an immutable sequence of Unicode characters`. That is what every string really is underneath. (That means we can put emojis in Python strings!)

Python supports converting Unicode characters to bytes, and bytes back to Unicode characters. The encoding could be utf-8, utf-16, or something else.

* `str.encode(encoding)` returns the bytes that the string maps to under the given encoding.
* `bytes.decode(encoding)` returns the string that the bytes map to under the given encoding.


```python
unicode_string = "租"
utf_bytes = unicode_string.encode("utf-8")  # str -> bytes
print(utf_bytes)
# Output: b'\xe7\xa7\x9f'
```


```python
# utf_bytes comes from the previous cell
result_string = utf_bytes.decode("utf-8")  # bytes -> str
print(result_string)
# Output: 租
```
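
The choice of encoding matters in both directions. As a rough sketch, the example below encodes the same character as UTF-8 and as little-endian UTF-16 (the codec name `"utf-16-le"` is chosen here to avoid the byte-order mark that plain `"utf-16"` prepends), then shows what happens when the wrong codec is used for decoding:

```python
text = "租"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")    # little-endian UTF-16, no byte-order mark
print(len(utf8_bytes), len(utf16_bytes))  # 3 2

# Decoding with the matching encoding restores the original string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

# Decoding with a mismatched encoding fails (or, for some codecs, yields garbage).
try:
    utf8_bytes.decode("ascii")
    wrong_codec_ok = True
except UnicodeDecodeError:
    wrong_codec_ok = False
print(wrong_codec_ok)                     # False
```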
