Converting from code points to bytes is encoding; converting from bytes to
code points is decoding. See Example 4-1.

Example 4-1. Encoding and decoding

In [5]:
s = 'cafeب'
len(s)

5

In [6]:
b = s.encode('utf8')
b

b'cafe\xd8\xa8'

In [7]:
b.decode('utf8')

'cafeب'
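
Encoding can change the length of a sequence. The five code points in `s` become six bytes in UTF-8, because ب (U+0628) needs two bytes. A minimal sketch of the round trip follows. It uses `'utf_8'`, which Python treats as an alias of the same codec as the `'utf8'` spelling above.

```python
s = 'cafeب'
b = s.encode('utf_8')

# len() counts code points on a str but bytes on a bytes object
print(len(s), len(b))

# Decoding with the same codec recovers the original str
assert b.decode('utf_8') == s
```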

### Byte Essentials

Example 4-2. A four-byte sequence as bytes and as bytearray

In [8]:
caffe = bytes('cafe', encoding='utf_8')
caffe

b'cafe'

In [9]:
caffe[0]

99

In [10]:
caffe[:1]

b'c'

In [11]:
caffe_arr = bytearray(caffe)
caffe_arr

bytearray(b'cafe')

In [12]:
caffe_arr[-1:]

bytearray(b'e')
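
Example 4-2 shows that indexing a binary sequence yields an `int`, while slicing yields a sequence of the same type. The other practical difference between the two types is mutability. A minimal sketch, reusing the names from the example:

```python
caffe = bytes('cafe', encoding='utf_8')
caffe_arr = bytearray(caffe)

# Indexing yields an int; slicing yields a sequence of the same type
print(type(caffe[0]), type(caffe[:1]), type(caffe_arr[-1:]))

# bytearray is mutable, so item assignment works (values must be 0..255)
caffe_arr[0] = ord('C')
print(caffe_arr)

# bytes is immutable, so the same assignment raises TypeError
try:
    caffe[0] = ord('C')
except TypeError as exc:
    print('TypeError:', exc)
```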

Example 4-3. Initializing bytes from the raw data of an array

In [13]:
import array

numbers = array.array('h', [-2, -1, 0, 1, 2])
octets = bytes(numbers)
octets

b'\xfe\xff\xff\xff\x00\x00\x01\x00\x02\x00'
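
`bytes(numbers)` copies the array's raw machine representation, so the byte order depends on the platform. The output above comes from a little-endian machine. The sketch below does two things:

- It round-trips the data with the standard-library `array.frombytes`.
- It uses `struct.pack` with the `'<'` prefix to produce little-endian bytes on any platform.

It assumes the `'h'` typecode has an item size of 2 bytes, as on mainstream platforms. The names `restored` and `little_endian` are illustrative.

```python
import array
import struct
import sys

numbers = array.array('h', [-2, -1, 0, 1, 2])
octets = bytes(numbers)

# Each 'h' item (signed short) occupies numbers.itemsize bytes, typically 2
print(sys.byteorder, numbers.itemsize, len(octets))

# frombytes reinterprets the raw bytes using the same typecode
restored = array.array('h')
restored.frombytes(octets)
assert restored == numbers

# The '<' prefix forces little-endian output regardless of the platform
little_endian = struct.pack('<5h', *numbers)
print(little_endian)
```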

Example 4-4. The string “El Niño” encoded with three codecs producing
very different byte sequences

In [1]:
for codec in ['latin_1', 'utf_8', 'utf16']:
    print(codec, 'El Niño'.encode(codec), sep='\t')

latin_1	b'El Ni\xf1o'
utf_8	b'El Ni\xc3\xb1o'
utf16	b'\xff\xfeE\x00l\x00 \x00N\x00i\x00\xf1\x00o\x00'
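
The byte counts differ as well. The seven code points of "El Niño" take:

- 7 bytes in latin_1.
- 8 bytes in UTF-8, because ñ needs two bytes.
- 16 bytes in UTF-16, because the `utf_16` codec prepends a 2-byte byte order mark (BOM), `b'\xff\xfe'` on little-endian machines.

The `utf_16_le` and `utf_16_be` codecs fix the byte order explicitly and emit no BOM. A minimal sketch comparing them:

```python
text = 'El Niño'
for codec in ['latin_1', 'utf_8', 'utf_16', 'utf_16_le', 'utf_16_be']:
    encoded = text.encode(codec)
    # Show the codec name, the byte count, and the bytes themselves
    print(f'{codec:10}{len(encoded):3}  {encoded!r}')
```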

Example 4-5. Encoding to bytes: success and error handling

In [2]:
city = 'São Paulo'
city.encode('utf_8')

b'S\xc3\xa3o Paulo'

In [3]:
city.encode('utf_16')

b'\xff\xfeS\x00\xe3\x00o\x00 \x00P\x00a\x00u\x00l\x00o\x00'

In [4]:
city.encode('iso8859_1')

b'S\xe3o Paulo'

In [5]:
city.encode('cp437')

UnicodeEncodeError: 'charmap' codec can't encode character '\xe3' in position 1: character maps to <undefined>

In [6]:
city.encode('cp437', errors='ignore')

b'So Paulo'
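
Besides `'ignore'`, the standard error handlers include `'replace'`, `'xmlcharrefreplace'` and `'backslashreplace'`. Each one substitutes something visible for the character that `cp437` cannot represent, rather than silently dropping it. A minimal sketch comparing them with the default `'strict'` handler:

```python
city = 'São Paulo'

# errors= selects what happens to characters the codec cannot represent
for handler in ['ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace']:
    print(f'{handler:18}', city.encode('cp437', errors=handler))

# Without errors=, the default 'strict' handler raises UnicodeEncodeError
try:
    city.encode('cp437')
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded
    print(exc.start, repr(exc.object[exc.start]))
```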

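Example 4-6 illustrates how using the wrong codec may produce gremlins (garbled characters). It also shows how errors='replace' avoids a UnicodeDecodeError by substituting U+FFFD, the official Unicode replacement character, for bytes that cannot be decoded.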

Example 4-6. Decoding from bytes to str: success and error handling

In [9]:
octets = b'Montr\xe9al'

In [10]:
octets.decode('cp1252')

'Montréal'

In [11]:
octets.decode('iso8859_7')

'Montrιal'

In [12]:
octets.decode('koi8_r')

'MontrИal'

In [13]:
octets.decode('utf_8', errors='replace')

'Montr�al'
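
Without an `errors` argument, decoding uses the default `'strict'` handler. In this case, byte 0xe9 at offset 5 does not begin a valid UTF-8 sequence, because it is not followed by a continuation byte, so the decoder raises UnicodeDecodeError. A minimal sketch; `bad_offset` is an illustrative name:

```python
octets = b'Montr\xe9al'

# The default 'strict' handler raises on bytes that are not valid UTF-8
try:
    octets.decode('utf_8')
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first undecodable byte
    bad_offset = exc.start
    print(exc)

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER instead of raising
print(octets.decode('utf_8', errors='replace'))
```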